Generalising Lexical Translation Strategies for MT Using Comparable Corpora

نویسندگان

  • Bogdan Babych
  • Serge Sharoff
  • Anthony Hartley
چکیده

We report on an on-going research project aimed at increasing the range of translation equivalents which can be automatically discovered by MT systems. The methodology is based on semi-supervised learning of indirect translation strategies from large comparable corpora and their application in run-time to generate novel, previously unseen translation equivalents. This approach is different from methods based on parallel resources, which currently can reuse only individual translation equivalents. Instead it models translation strategies which generalise individual equivalents and can successfully generate an open class of new translation solutions. The end goal of the project is integration of the developed technology into open-source MT systems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Investigating the Social Practice of Persian Translations of ‘The Girl You Left Behind’ through Translators’ Lexical and Grammatical Strategies

The present study aimed to shed light upon the differences of social practice of Persian translations of The Girl You Left Behind written by Jojo Moyes (2012) with original text in English based on Fairclough's (1995) model. In this regard, through a careful analysis of the source and target texts, English social prac- tice instances were selected along with their Persian equivalents as the cor...

متن کامل

استخراج پیکره‌ موازی از اسناد قابل‌مقایسه برای بهبود کیفیت ترجمه در سیستم‌های ترجمه ماشینی

Data used for training statistical machine translation method are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation but due to lack of these kinds of texts, non-parallel and comparable corpora are used either for parallel text extraction. Most of existing methods for exploiting comparable corpora loo...

متن کامل

Parallel Sentence Extraction from Comparable Corpora with Neural Network Features

Parallel corpora are crucial for machine translation (MT), however they are quite scarce for most language pairs and domains. As comparable corpora are far more available, many studies have been conducted to extract parallel sentences from them for MT. In this paper, we exploit the neural network features acquired from neural MT for parallel sentence extraction. We observe significant improveme...

متن کامل

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

متن کامل

That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models

Parallel corpora are often injected with bilingual lexical resources for improved Indian language machine translation (MT). In absence of such lexical resources, multilingual topic models have been used to create coarse lexical resources in the past, using a Cartesian product approach. Our results show that for morphologically rich languages like Hindi, the Cartesian product approach is detrime...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008